PM 566 Assignment 1

Author

Erica Shin

Step 1

#2002
old <- data.table::fread(file.path("~", "Downloads", "ad_viz_plotval_data.csv"))
old <- as.data.frame(old)

dim(old)
[1] 15976    22
head(old)
        Date Source  Site ID POC Daily Mean PM2.5 Concentration    Units
1 01/05/2002    AQS 60010007   1                           25.1 ug/m3 LC
2 01/06/2002    AQS 60010007   1                           31.6 ug/m3 LC
3 01/08/2002    AQS 60010007   1                           21.4 ug/m3 LC
4 01/11/2002    AQS 60010007   1                           25.9 ug/m3 LC
5 01/14/2002    AQS 60010007   1                           34.5 ug/m3 LC
6 01/17/2002    AQS 60010007   1                           41.0 ug/m3 LC
  Daily AQI Value Local Site Name Daily Obs Count Percent Complete
1              81       Livermore               1              100
2              93       Livermore               1              100
3              74       Livermore               1              100
4              82       Livermore               1              100
5              98       Livermore               1              100
6             115       Livermore               1              100
  AQS Parameter Code AQS Parameter Description Method Code
1              88101  PM2.5 - Local Conditions         120
2              88101  PM2.5 - Local Conditions         120
3              88101  PM2.5 - Local Conditions         120
4              88101  PM2.5 - Local Conditions         120
5              88101  PM2.5 - Local Conditions         120
6              88101  PM2.5 - Local Conditions         120
                     Method Description CBSA Code
1 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
2 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
3 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
4 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
5 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
6 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS     41860
                          CBSA Name State FIPS Code      State County FIPS Code
1 San Francisco-Oakland-Hayward, CA               6 California                1
2 San Francisco-Oakland-Hayward, CA               6 California                1
3 San Francisco-Oakland-Hayward, CA               6 California                1
4 San Francisco-Oakland-Hayward, CA               6 California                1
5 San Francisco-Oakland-Hayward, CA               6 California                1
6 San Francisco-Oakland-Hayward, CA               6 California                1
   County Site Latitude Site Longitude
1 Alameda      37.68753      -121.7842
2 Alameda      37.68753      -121.7842
3 Alameda      37.68753      -121.7842
4 Alameda      37.68753      -121.7842
5 Alameda      37.68753      -121.7842
6 Alameda      37.68753      -121.7842
tail(old)
            Date Source  Site ID POC Daily Mean PM2.5 Concentration    Units
15971 12/10/2002    AQS 61131003   1                             15 ug/m3 LC
15972 12/13/2002    AQS 61131003   1                             15 ug/m3 LC
15973 12/22/2002    AQS 61131003   1                              1 ug/m3 LC
15974 12/25/2002    AQS 61131003   1                             23 ug/m3 LC
15975 12/28/2002    AQS 61131003   1                              5 ug/m3 LC
15976 12/31/2002    AQS 61131003   1                              6 ug/m3 LC
      Daily AQI Value      Local Site Name Daily Obs Count Percent Complete
15971              62 Woodland-Gibson Road               1              100
15972              62 Woodland-Gibson Road               1              100
15973               6 Woodland-Gibson Road               1              100
15974              77 Woodland-Gibson Road               1              100
15975              28 Woodland-Gibson Road               1              100
15976              33 Woodland-Gibson Road               1              100
      AQS Parameter Code AQS Parameter Description Method Code
15971              88101  PM2.5 - Local Conditions         117
15972              88101  PM2.5 - Local Conditions         117
15973              88101  PM2.5 - Local Conditions         117
15974              88101  PM2.5 - Local Conditions         117
15975              88101  PM2.5 - Local Conditions         117
15976              88101  PM2.5 - Local Conditions         117
                         Method Description CBSA Code
15971 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15972 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15973 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15974 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15975 R & P Model 2000 PM2.5 Sampler w/WINS     40900
15976 R & P Model 2000 PM2.5 Sampler w/WINS     40900
                                    CBSA Name State FIPS Code      State
15971 Sacramento--Roseville--Arden-Arcade, CA               6 California
15972 Sacramento--Roseville--Arden-Arcade, CA               6 California
15973 Sacramento--Roseville--Arden-Arcade, CA               6 California
15974 Sacramento--Roseville--Arden-Arcade, CA               6 California
15975 Sacramento--Roseville--Arden-Arcade, CA               6 California
15976 Sacramento--Roseville--Arden-Arcade, CA               6 California
      County FIPS Code County Site Latitude Site Longitude
15971              113   Yolo      38.66121      -121.7327
15972              113   Yolo      38.66121      -121.7327
15973              113   Yolo      38.66121      -121.7327
15974              113   Yolo      38.66121      -121.7327
15975              113   Yolo      38.66121      -121.7327
15976              113   Yolo      38.66121      -121.7327
str(old)
'data.frame':   15976 obs. of  22 variables:
 $ Date                          : chr  "01/05/2002" "01/06/2002" "01/08/2002" "01/11/2002" ...
 $ Source                        : chr  "AQS" "AQS" "AQS" "AQS" ...
 $ Site ID                       : int  60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
 $ POC                           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Daily Mean PM2.5 Concentration: num  25.1 31.6 21.4 25.9 34.5 41 29.3 15 18.8 37.9 ...
 $ Units                         : chr  "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
 $ Daily AQI Value               : int  81 93 74 82 98 115 89 62 69 107 ...
 $ Local Site Name               : chr  "Livermore" "Livermore" "Livermore" "Livermore" ...
 $ Daily Obs Count               : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Percent Complete              : num  100 100 100 100 100 100 100 100 100 100 ...
 $ AQS Parameter Code            : int  88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
 $ AQS Parameter Description     : chr  "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
 $ Method Code                   : int  120 120 120 120 120 120 120 120 120 120 ...
 $ Method Description            : chr  "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" ...
 $ CBSA Code                     : int  41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
 $ CBSA Name                     : chr  "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
 $ State FIPS Code               : int  6 6 6 6 6 6 6 6 6 6 ...
 $ State                         : chr  "California" "California" "California" "California" ...
 $ County FIPS Code              : int  1 1 1 1 1 1 1 1 1 1 ...
 $ County                        : chr  "Alameda" "Alameda" "Alameda" "Alameda" ...
 $ Site Latitude                 : num  37.7 37.7 37.7 37.7 37.7 ...
 $ Site Longitude                : num  -122 -122 -122 -122 -122 ...
summary(old)
     Date              Source             Site ID              POC       
 Length:15976       Length:15976       Min.   :60010007   Min.   :1.000  
 Class :character   Class :character   1st Qu.:60290014   1st Qu.:1.000  
 Mode  :character   Mode  :character   Median :60590007   Median :1.000  
                                       Mean   :60549600   Mean   :1.581  
                                       3rd Qu.:60731002   3rd Qu.:1.000  
                                       Max.   :61131003   Max.   :6.000  
                                                                         
 Daily Mean PM2.5 Concentration    Units           Daily AQI Value 
 Min.   :  0.00                 Length:15976       Min.   :  0.00  
 1st Qu.:  7.00                 Class :character   1st Qu.: 39.00  
 Median : 12.00                 Mode  :character   Median : 56.00  
 Mean   : 16.12                                    Mean   : 59.28  
 3rd Qu.: 20.50                                    3rd Qu.: 72.00  
 Max.   :104.30                                    Max.   :185.00  
                                                                   
 Local Site Name    Daily Obs Count Percent Complete AQS Parameter Code
 Length:15976       Min.   :1       Min.   :100      Min.   :88101     
 Class :character   1st Qu.:1       1st Qu.:100      1st Qu.:88101     
 Mode  :character   Median :1       Median :100      Median :88101     
                    Mean   :1       Mean   :100      Mean   :88215     
                    3rd Qu.:1       3rd Qu.:100      3rd Qu.:88502     
                    Max.   :1       Max.   :100      Max.   :88502     
                                                                       
 AQS Parameter Description  Method Code  Method Description   CBSA Code    
 Length:15976              Min.   :117   Length:15976       Min.   :12540  
 Class :character          1st Qu.:120   Class :character   1st Qu.:23420  
 Mode  :character          Median :120   Mode  :character   Median :40140  
                           Mean   :297                      Mean   :33270  
                           3rd Qu.:707                      3rd Qu.:41740  
                           Max.   :810                      Max.   :49700  
                                                            NA's   :929    
  CBSA Name         State FIPS Code    State           County FIPS Code
 Length:15976       Min.   :6       Length:15976       Min.   :  1.00  
 Class :character   1st Qu.:6       Class :character   1st Qu.: 29.00  
 Mode  :character   Median :6       Mode  :character   Median : 59.00  
                    Mean   :6                          Mean   : 54.78  
                    3rd Qu.:6                          3rd Qu.: 73.00  
                    Max.   :6                          Max.   :113.00  
                                                                       
    County          Site Latitude   Site Longitude  
 Length:15976       Min.   :32.63   Min.   :-124.2  
 Class :character   1st Qu.:34.07   1st Qu.:-121.4  
 Mode  :character   Median :35.36   Median :-119.1  
                    Mean   :36.00   Mean   :-119.4  
                    3rd Qu.:37.77   3rd Qu.:-117.9  
                    Max.   :41.71   Max.   :-115.5  
                                                    
mean(is.na(old$`Daily Mean PM2.5 Concentration`))
[1] 0
summary(old$`Daily Mean PM2.5 Concentration`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    7.00   12.00   16.12   20.50  104.30 
#2022
new <- data.table::fread(file.path("~", "Downloads", "ad_viz_plotval_data-2.csv"))
new <- as.data.frame(new)

dim(new)
[1] 59756    22
head(new)
        Date Source  Site ID POC Daily Mean PM2.5 Concentration    Units
1 01/01/2022    AQS 60010007   3                           12.7 ug/m3 LC
2 01/02/2022    AQS 60010007   3                           13.9 ug/m3 LC
3 01/03/2022    AQS 60010007   3                            7.1 ug/m3 LC
4 01/04/2022    AQS 60010007   3                            3.7 ug/m3 LC
5 01/05/2022    AQS 60010007   3                            4.2 ug/m3 LC
6 01/06/2022    AQS 60010007   3                            3.8 ug/m3 LC
  Daily AQI Value Local Site Name Daily Obs Count Percent Complete
1              58       Livermore               1              100
2              60       Livermore               1              100
3              39       Livermore               1              100
4              21       Livermore               1              100
5              23       Livermore               1              100
6              21       Livermore               1              100
  AQS Parameter Code AQS Parameter Description Method Code
1              88101  PM2.5 - Local Conditions         170
2              88101  PM2.5 - Local Conditions         170
3              88101  PM2.5 - Local Conditions         170
4              88101  PM2.5 - Local Conditions         170
5              88101  PM2.5 - Local Conditions         170
6              88101  PM2.5 - Local Conditions         170
                    Method Description CBSA Code
1 Met One BAM-1020 Mass Monitor w/VSCC     41860
2 Met One BAM-1020 Mass Monitor w/VSCC     41860
3 Met One BAM-1020 Mass Monitor w/VSCC     41860
4 Met One BAM-1020 Mass Monitor w/VSCC     41860
5 Met One BAM-1020 Mass Monitor w/VSCC     41860
6 Met One BAM-1020 Mass Monitor w/VSCC     41860
                          CBSA Name State FIPS Code      State County FIPS Code
1 San Francisco-Oakland-Hayward, CA               6 California                1
2 San Francisco-Oakland-Hayward, CA               6 California                1
3 San Francisco-Oakland-Hayward, CA               6 California                1
4 San Francisco-Oakland-Hayward, CA               6 California                1
5 San Francisco-Oakland-Hayward, CA               6 California                1
6 San Francisco-Oakland-Hayward, CA               6 California                1
   County Site Latitude Site Longitude
1 Alameda      37.68753      -121.7842
2 Alameda      37.68753      -121.7842
3 Alameda      37.68753      -121.7842
4 Alameda      37.68753      -121.7842
5 Alameda      37.68753      -121.7842
6 Alameda      37.68753      -121.7842
tail(new)
            Date Source  Site ID POC Daily Mean PM2.5 Concentration    Units
59751 12/01/2022    AQS 61131003   1                            3.4 ug/m3 LC
59752 12/07/2022    AQS 61131003   1                            3.8 ug/m3 LC
59753 12/13/2022    AQS 61131003   1                            6.0 ug/m3 LC
59754 12/19/2022    AQS 61131003   1                           34.8 ug/m3 LC
59755 12/25/2022    AQS 61131003   1                           23.2 ug/m3 LC
59756 12/31/2022    AQS 61131003   1                            1.0 ug/m3 LC
      Daily AQI Value      Local Site Name Daily Obs Count Percent Complete
59751              19 Woodland-Gibson Road               1              100
59752              21 Woodland-Gibson Road               1              100
59753              33 Woodland-Gibson Road               1              100
59754              99 Woodland-Gibson Road               1              100
59755              77 Woodland-Gibson Road               1              100
59756               6 Woodland-Gibson Road               1              100
      AQS Parameter Code AQS Parameter Description Method Code
59751              88101  PM2.5 - Local Conditions         145
59752              88101  PM2.5 - Local Conditions         145
59753              88101  PM2.5 - Local Conditions         145
59754              88101  PM2.5 - Local Conditions         145
59755              88101  PM2.5 - Local Conditions         145
59756              88101  PM2.5 - Local Conditions         145
                                         Method Description CBSA Code
59751 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59752 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59753 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59754 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59755 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
59756 R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC     40900
                                    CBSA Name State FIPS Code      State
59751 Sacramento--Roseville--Arden-Arcade, CA               6 California
59752 Sacramento--Roseville--Arden-Arcade, CA               6 California
59753 Sacramento--Roseville--Arden-Arcade, CA               6 California
59754 Sacramento--Roseville--Arden-Arcade, CA               6 California
59755 Sacramento--Roseville--Arden-Arcade, CA               6 California
59756 Sacramento--Roseville--Arden-Arcade, CA               6 California
      County FIPS Code County Site Latitude Site Longitude
59751              113   Yolo      38.66121      -121.7327
59752              113   Yolo      38.66121      -121.7327
59753              113   Yolo      38.66121      -121.7327
59754              113   Yolo      38.66121      -121.7327
59755              113   Yolo      38.66121      -121.7327
59756              113   Yolo      38.66121      -121.7327
str(new)
'data.frame':   59756 obs. of  22 variables:
 $ Date                          : chr  "01/01/2022" "01/02/2022" "01/03/2022" "01/04/2022" ...
 $ Source                        : chr  "AQS" "AQS" "AQS" "AQS" ...
 $ Site ID                       : int  60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
 $ POC                           : int  3 3 3 3 3 3 3 3 3 3 ...
 $ Daily Mean PM2.5 Concentration: num  12.7 13.9 7.1 3.7 4.2 3.8 2.3 6.9 13.6 11.2 ...
 $ Units                         : chr  "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
 $ Daily AQI Value               : int  58 60 39 21 23 21 13 38 59 55 ...
 $ Local Site Name               : chr  "Livermore" "Livermore" "Livermore" "Livermore" ...
 $ Daily Obs Count               : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Percent Complete              : num  100 100 100 100 100 100 100 100 100 100 ...
 $ AQS Parameter Code            : int  88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
 $ AQS Parameter Description     : chr  "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
 $ Method Code                   : int  170 170 170 170 170 170 170 170 170 170 ...
 $ Method Description            : chr  "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" ...
 $ CBSA Code                     : int  41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
 $ CBSA Name                     : chr  "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
 $ State FIPS Code               : int  6 6 6 6 6 6 6 6 6 6 ...
 $ State                         : chr  "California" "California" "California" "California" ...
 $ County FIPS Code              : int  1 1 1 1 1 1 1 1 1 1 ...
 $ County                        : chr  "Alameda" "Alameda" "Alameda" "Alameda" ...
 $ Site Latitude                 : num  37.7 37.7 37.7 37.7 37.7 ...
 $ Site Longitude                : num  -122 -122 -122 -122 -122 ...
summary(new)
     Date              Source             Site ID              POC       
 Length:59756       Length:59756       Min.   :60010007   Min.   : 1.00  
 Class :character   Class :character   1st Qu.:60290019   1st Qu.: 1.00  
 Mode  :character   Mode  :character   Median :60631006   Median : 3.00  
                                       Mean   :60563315   Mean   : 3.77  
                                       3rd Qu.:60731026   3rd Qu.: 3.00  
                                       Max.   :61131003   Max.   :24.00  
                                                                         
 Daily Mean PM2.5 Concentration    Units           Daily AQI Value 
 Min.   : -6.700                Length:59756       Min.   :  0.00  
 1st Qu.:  4.100                Class :character   1st Qu.: 23.00  
 Median :  6.800                Mode  :character   Median : 38.00  
 Mean   :  8.428                                   Mean   : 39.28  
 3rd Qu.: 10.700                                   3rd Qu.: 54.00  
 Max.   :302.500                                   Max.   :454.00  
                                                                   
 Local Site Name    Daily Obs Count Percent Complete AQS Parameter Code
 Length:59756       Min.   :1       Min.   :100      Min.   :88101     
 Class :character   1st Qu.:1       1st Qu.:100      1st Qu.:88101     
 Mode  :character   Median :1       Median :100      Median :88101     
                    Mean   :1       Mean   :100      Mean   :88192     
                    3rd Qu.:1       3rd Qu.:100      3rd Qu.:88101     
                    Max.   :1       Max.   :100      Max.   :88502     
                                                                       
 AQS Parameter Description  Method Code  Method Description   CBSA Code    
 Length:59756              Min.   :143   Length:59756       Min.   :12540  
 Class :character          1st Qu.:170   Class :character   1st Qu.:31080  
 Mode  :character          Median :170   Mode  :character   Median :40140  
                           Mean   :336                      Mean   :34957  
                           3rd Qu.:707                      3rd Qu.:41860  
                           Max.   :810                      Max.   :49700  
                                                            NA's   :4567   
  CBSA Name         State FIPS Code    State           County FIPS Code
 Length:59756       Min.   :6       Length:59756       Min.   :  1.00  
 Class :character   1st Qu.:6       Class :character   1st Qu.: 29.00  
 Mode  :character   Median :6       Mode  :character   Median : 63.00  
                    Mean   :6                          Mean   : 56.19  
                    3rd Qu.:6                          3rd Qu.: 73.00  
                    Max.   :6                          Max.   :113.00  
                                                                       
    County          Site Latitude   Site Longitude  
 Length:59756       Min.   :32.58   Min.   :-124.2  
 Class :character   1st Qu.:34.07   1st Qu.:-121.4  
 Mode  :character   Median :36.49   Median :-119.6  
                    Mean   :36.24   Mean   :-119.6  
                    3rd Qu.:37.96   3rd Qu.:-117.9  
                    Max.   :41.76   Max.   :-115.5  
                                                    
mean(is.na(new$`Daily Mean PM2.5 Concentration`))
[1] 0
summary(new$`Daily Mean PM2.5 Concentration`)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -6.700   4.100   6.800   8.428  10.700 302.500 
#finding total number of negative daily mean PM2.5 values
length(new[new$`Daily Mean PM2.5 Concentration`<0,'Daily Mean PM2.5 Concentration'])
[1] 215
#215 values

2002 data summary:

For the 2002 dataset, the dimensions are 15,976 rows (observations) by 22 columns (variables).

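No apparent issues with the daily mean PM2.5 concentration: it has no missing values and ranges from 0 to 104.3. The CBSA code is missing for 929 observations.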

2022 data summary:

For the 2022 dataset, the dimensions are 59,756 rows (observations) by 22 columns (variables).

The daily mean PM2.5 concentration has a minimum of -6.7, which is implausible because a concentration cannot be negative. In total, 215 observations have a negative daily mean PM2.5 concentration.
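The same count can be obtained more directly by summing a logical vector. This is a minimal sketch that uses a small made-up vector (`pm`) in place of the real column, so the numbers it prints are illustrative only:

```r
# Toy values standing in for new$`Daily Mean PM2.5 Concentration`
# (made up for illustration, not taken from the dataset)
pm <- c(12.7, -0.4, 3.8, -6.7, 0, 5.1)

# TRUE counts as 1, so sum() of a logical vector is the number of matches
n_negative <- sum(pm < 0)

# mean() of the same logical vector is the share of negative observations
share_negative <- mean(pm < 0)

n_negative
share_negative
```

Applied to the 2022 column, the count is the 215 reported above, which is about 0.36% of the 59,756 observations.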

Both 2002 and 2022 dataset findings:

Both datasets have three types of variables: character, integer, and numeric.

Character variable names: date, source, units, local site name, AQS parameter description, method description, CBSA name, state, county

Integer variable names: site ID, POC, daily AQI value, daily obs count, AQS parameter code, method code, CBSA code, state FIPS code, county FIPS code

Numeric variable names: daily mean PM2.5 concentration, percent complete, site latitude, site longitude
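The grouping of variables by type does not have to be read off the `str()` output by hand. A possible sketch, shown on a made-up three-column data frame (`toy` and its columns are hypothetical stand-ins for `old` or `new`):

```r
# Toy data frame with one column of each type (made up for illustration)
toy <- data.frame(
  Date = c("01/05/2002", "01/06/2002"),
  POC  = c(1L, 1L),
  pm   = c(25.1, 31.6),
  stringsAsFactors = FALSE
)

# Class of every column, as a named character vector
col_types <- sapply(toy, class)

# Number of columns of each type
type_counts <- table(col_types)

# Column names grouped by type
vars_by_type <- split(names(col_types), col_types)

type_counts
vars_by_type
```

Run on either real dataset, this should reproduce the lists above: 9 character, 9 integer, and 4 numeric variables.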

Step 2

#combining two years into one data frame
both <- rbind(old, new)
dim(both)
[1] 75732    22
#creating new column for year
both$Year <- format(as.Date(both$Date, format="%m/%d/%Y"),"%Y")

#changing names of key variables
names(both)[names(both) == "Daily Mean PM2.5 Concentration"] <- "pm2.5mean"

names(both)[names(both) == "Site Latitude"] <- "lat"

names(both)[names(both) == "Site Longitude"] <- "lon"

Step 3

library(leaflet)

old2 <- both[both$Year == 2002, ]
new2 <- both[both$Year == 2022, ]

#one map with both years
leaflet() %>%
  addProviderTiles('OpenStreetMap') %>% 
  addCircles(
    data = old2,
    lat=~lat,lng=~lon, popup = "2002",
    opacity=1, fillOpacity=1, radius=100, color = "blue") %>%
  addCircles(
    data = new2,
    lat=~lat,lng=~lon, popup = "2022",
    opacity=1, fillOpacity=1, radius=100, color = "red")
#might help to make two maps, one for each year because a lot of the stations haven't moved

#separate map for 2002
leaflet() %>%
  addProviderTiles('OpenStreetMap') %>% 
  addCircles(
    data = old2,
    lat=~lat,lng=~lon, popup = "2002",
    opacity=1, fillOpacity=1, radius=100, color = "blue")
#separate map for 2022
leaflet() %>%
  addProviderTiles('OpenStreetMap') %>% 
  addCircles(
    data = new2,
    lat=~lat,lng=~lon, popup = "2022",
    opacity=1, fillOpacity=1, radius=100, color = "red")

Summary of spatial distribution:

The leaflet maps indicate that there are more data points for 2022 (red) than for 2002 (blue). For both 2002 and 2022, the data points are distributed throughout California, with clusters around Los Angeles and the Bay Area. Compared to the 2022 data, the 2002 data points are sparser, especially in the central part of the state.
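The visual impression of more points in 2022 could be backed by counting distinct monitoring locations per year. The sketch below does this on a made-up data frame; `toy` and its `site` column are hypothetical, and in the combined data the columns would be `Site ID`, `lat`, `lon`, and `Year`:

```r
# Toy observations: several rows per site, as in the daily data
# (values made up for illustration)
toy <- data.frame(
  Year = c("2002", "2002", "2002", "2022", "2022", "2022", "2022"),
  site = c(1, 1, 2, 1, 2, 3, 3),
  lat  = c(37.7, 37.7, 38.7, 37.7, 38.7, 34.1, 34.1),
  lon  = c(-121.8, -121.8, -121.7, -121.8, -121.7, -118.2, -118.2)
)

# Keep one row per site and year
sites <- unique(toy[, c("Year", "site", "lat", "lon")])

# Number of distinct sites in each year
sites_per_year <- table(sites$Year)
sites_per_year
```

Passing a de-duplicated table like `sites` to `addCircles()` instead of every daily row would likely also make the maps lighter to render, since each site is then drawn once.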

Step 4

sum(is.na(both$pm2.5mean))
[1] 0
#there are 0 missing values of PM 2.5

both <- both[!is.na(both$pm2.5mean), ]

both <- both[order(both$pm2.5mean), ]

head(both)
            Date Source  Site ID POC pm2.5mean    Units Daily AQI Value
42912 09/20/2022    AQS 60571001   5      -6.7 ug/m3 LC               0
42911 09/19/2022    AQS 60571001   5      -6.3 ug/m3 LC               0
42913 09/21/2022    AQS 60571001   5      -5.1 ug/m3 LC               0
42896 09/03/2022    AQS 60571001   5      -4.7 ug/m3 LC               0
42914 09/22/2022    AQS 60571001   5      -4.7 ug/m3 LC               0
42897 09/04/2022    AQS 60571001   5      -4.1 ug/m3 LC               0
           Local Site Name Daily Obs Count Percent Complete AQS Parameter Code
42912 Truckee-Fire Station               1              100              88502
42911 Truckee-Fire Station               1              100              88502
42913 Truckee-Fire Station               1              100              88502
42896 Truckee-Fire Station               1              100              88502
42914 Truckee-Fire Station               1              100              88502
42897 Truckee-Fire Station               1              100              88502
                   AQS Parameter Description Method Code
42912 Acceptable PM2.5 AQI & Speciation Mass         733
42911 Acceptable PM2.5 AQI & Speciation Mass         733
42913 Acceptable PM2.5 AQI & Speciation Mass         733
42896 Acceptable PM2.5 AQI & Speciation Mass         733
42914 Acceptable PM2.5 AQI & Speciation Mass         733
42897 Acceptable PM2.5 AQI & Speciation Mass         733
            Method Description CBSA Code                CBSA Name
42912 Met-One BAM W/PM2.5 VSCC     46020 Truckee-Grass Valley, CA
42911 Met-One BAM W/PM2.5 VSCC     46020 Truckee-Grass Valley, CA
42913 Met-One BAM W/PM2.5 VSCC     46020 Truckee-Grass Valley, CA
42896 Met-One BAM W/PM2.5 VSCC     46020 Truckee-Grass Valley, CA
42914 Met-One BAM W/PM2.5 VSCC     46020 Truckee-Grass Valley, CA
42897 Met-One BAM W/PM2.5 VSCC     46020 Truckee-Grass Valley, CA
      State FIPS Code      State County FIPS Code County      lat       lon
42912               6 California               57 Nevada 39.32783 -120.1846
42911               6 California               57 Nevada 39.32783 -120.1846
42913               6 California               57 Nevada 39.32783 -120.1846
42896               6 California               57 Nevada 39.32783 -120.1846
42914               6 California               57 Nevada 39.32783 -120.1846
42897               6 California               57 Nevada 39.32783 -120.1846
      Year
42912 2022
42911 2022
42913 2022
42896 2022
42914 2022
42897 2022
summary(both$pm2.5mean)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  -6.70    4.50    7.60   10.05   12.20  302.50 
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
#dataset with only negative mean PM2.5 values
neg <- both[both$pm2.5mean < 0, ]

summary(neg)
     Date              Source             Site ID              POC      
 Length:215         Length:215         Min.   :60010011   Min.   :1.00  
 Class :character   Class :character   1st Qu.:60292009   1st Qu.:3.00  
 Mode  :character   Mode  :character   Median :60651016   Median :3.00  
                                       Mean   :60614750   Mean   :2.67  
                                       3rd Qu.:60831008   3rd Qu.:3.00  
                                       Max.   :61130004   Max.   :5.00  
                                                                        
   pm2.5mean         Units           Daily AQI Value Local Site Name   
 Min.   :-6.700   Length:215         Min.   :0       Length:215        
 1st Qu.:-0.800   Class :character   1st Qu.:0       Class :character  
 Median :-0.400   Mode  :character   Median :0       Mode  :character  
 Mean   :-0.707                      Mean   :0                         
 3rd Qu.:-0.200                      3rd Qu.:0                         
 Max.   :-0.100                      Max.   :0                         
                                                                       
 Daily Obs Count Percent Complete AQS Parameter Code AQS Parameter Description
 Min.   :1       Min.   :100      Min.   :88101      Length:215               
 1st Qu.:1       1st Qu.:100      1st Qu.:88101      Class :character         
 Median :1       Median :100      Median :88101      Mode  :character         
 Mean   :1       Mean   :100      Mean   :88252                               
 3rd Qu.:1       3rd Qu.:100      3rd Qu.:88502                               
 Max.   :1       Max.   :100      Max.   :88502                               
                                                                              
  Method Code    Method Description   CBSA Code      CBSA Name        
 Min.   :170.0   Length:215         Min.   :12540   Length:215        
 1st Qu.:170.0   Class :character   1st Qu.:37100   Class :character  
 Median :170.0   Mode  :character   Median :40900   Mode  :character  
 Mean   :371.2                      Mean   :36160                     
 3rd Qu.:731.0                      3rd Qu.:42100                     
 Max.   :733.0                      Max.   :47300                     
                                    NA's   :19                        
 State FIPS Code    State           County FIPS Code    County         
 Min.   :6       Length:215         Min.   :  1.00   Length:215        
 1st Qu.:6       Class :character   1st Qu.: 29.00   Class :character  
 Median :6       Mode  :character   Median : 65.00   Mode  :character  
 Mean   :6                          Mean   : 61.33                     
 3rd Qu.:6                          3rd Qu.: 83.00                     
 Max.   :6                          Max.   :113.00                     
                                                                       
      lat             lon             Year          
 Min.   :32.84   Min.   :-124.2   Length:215        
 1st Qu.:34.84   1st Qu.:-122.0   Class :character  
 Median :37.06   Median :-121.1   Mode  :character  
 Mean   :37.04   Mean   :-120.5                     
 3rd Qu.:38.94   3rd Qu.:-118.9                     
 Max.   :41.76   Max.   :-115.5                     
                                                    
library(ggplot2)

#exploring proportion of neg mean PM 2.5 values
neg |> 
  ggplot() +
  geom_bar(mapping=aes(x=pm2.5mean, y=after_stat(prop)))

neg |> 
  ggplot() +
  geom_bar(mapping=aes(x=pm2.5mean, color=Date))

neg$Month <- format(as.Date(neg$Date, format="%m/%d/%Y"),"%m")

neg |> 
  ggplot() +
  geom_bar(mapping=aes(x=pm2.5mean, y=after_stat(prop), color=Month))

neg |> 
  ggplot() +
  geom_bar(mapping=aes(x=Month, y=pm2.5mean), stat="identity")

#bars stack (sum) pm2.5mean within each month; the most negative total comes from September 2022

neg |> 
  ggplot() +
  geom_boxplot(mapping=aes(x=Month, y=pm2.5mean))

#shows the widest spread of negative values comes from September 2022

#want to see if they came from a single day or were evenly distributed across timeframe

There are no missing values for the mean PM 2.5 concentration in the combined dataset.

There are 215 implausible values in the combined dataset. All of them are negative, which is implausible because a mean PM 2.5 concentration cannot be below zero.

A bar plot of the proportions of the 215 negative values shows that most of them fall between -2 and 0, and that the distribution is left-skewed. Box plots by month show that the widest spread of negative values comes from September 2022. A bar plot by month, in which the bars stack (sum) the values within each month, shows that the most negative total also comes from September 2022.
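One way to check whether the negative readings cluster on a few days or are spread across the period is to tabulate them by month and by date. The sketch below is only an illustration: `neg_demo` and its values are made up, and it assumes the same `Date` and `pm2.5mean` columns as `neg`.

```r
# Hypothetical stand-in for `neg`: made-up dates and negative PM 2.5 means
neg_demo <- data.frame(
  Date = c("09/05/2022", "09/05/2022", "09/06/2022", "01/10/2022", "03/02/2022"),
  pm2.5mean = c(-0.5, -6.7, -1.2, -0.3, -1.9)
)

# Extract the month the same way it is done for `neg`
neg_demo$Month <- format(as.Date(neg_demo$Date, format = "%m/%d/%Y"), "%m")

# Number of negative readings per month and per calendar date
by_month <- table(neg_demo$Month)
by_date  <- sort(table(neg_demo$Date), decreasing = TRUE)

# Share of negative readings that lie between -2 and 0
share_small <- mean(neg_demo$pm2.5mean > -2)

# Most negative single reading within each month
min_by_month <- tapply(neg_demo$pm2.5mean, neg_demo$Month, min)

print(by_month)
print(by_date)
print(share_small)
print(min_by_month)
```

Running the same steps on `neg` would show whether a handful of dates account for most of the negative values, and `min_by_month` separates the most negative single reading from the monthly totals that a stacked bar plot displays.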

Step 5

library(ggplot2)

#removing 215 implausible/negative values
both <- both[both$pm2.5mean >= 0, ]

#state scatter plot
plot(both$Year, both$pm2.5mean, col = factor(both$State))

#plot(old2$pm2.5mean, col = factor(old2$State))
old_hist_state <- hist(old2$pm2.5mean, col = factor(old2$State))

new_hist_state <- hist(new2$pm2.5mean, col = factor(new2$State))

#state geom scatter plot
#pipe the filtered rows into ggplot(); passing data=both as well would override the filter
both[!is.na(both$pm2.5mean), ] |>
  ggplot(mapping=aes(x=Year, y=pm2.5mean, color=State)) +
  geom_point() +
  geom_smooth()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

#state boxplot
both[!is.na(both$pm2.5mean), ] |>
  ggplot(mapping=aes(x=Year, y=pm2.5mean, fill=State)) +
  geom_boxplot()

#county scatter plot
plot(both$Year, both$pm2.5mean, col = factor(both$County))

#county geom scatter plot
both[!is.na(both$pm2.5mean), ] |>
  ggplot(mapping=aes(x=Year, y=pm2.5mean, color=County)) +
  geom_point() +
  geom_smooth()
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

#county boxplot
both[!is.na(both$pm2.5mean), ] |>
  ggplot(mapping=aes(x=Year, y=pm2.5mean, fill=County)) +
  geom_boxplot()

#site in LA
both_la <- both[both$County == 'Los Angeles', ]

#plot only the Los Angeles rows (x, y and col must all come from both_la)
plot(both_la$Year, both_la$pm2.5mean, col = factor(both_la$County))

#all boxplots
ggplot(data=both, aes(x=Year, y=pm2.5mean)) +
  geom_boxplot(aes(fill=State), width=0.8) + theme_bw()

ggplot(data=both, aes(x=Year, y=pm2.5mean)) +
  geom_boxplot(aes(fill=County), width=0.8) + theme_bw()

ggplot(data=both_la, aes(x=Year, y=pm2.5mean)) +
  geom_boxplot(aes(fill=County), width=0.8) + theme_bw()

#boxplot(both$pm2.5mean ~ both$Year, col=factor(both$State))


#take the average at the county level: average within each Year-County group, then plot the county averages by year
library(dplyr) #provides group_by() and summarize()
cnty <- both |>
  group_by(Year, County) |>
  summarize(avg_pm2.5mean = mean(pm2.5mean, na.rm=TRUE))
`summarise()` has grouped output by 'Year'. You can override using the
`.groups` argument.
ggplot(data=cnty, aes(x=Year, y=avg_pm2.5mean)) +
  geom_boxplot()

State:

Compared with 2002, the 2022 data have a narrower interquartile range (IQR) and a lower median. However, the 2022 data span a wider overall range, with a much higher maximum (around 300) and a lower minimum (below 0, i.e., the implausible values). At the state level, the data suggest that daily PM 2.5 concentrations in California may have decreased over the last 20 years (from 2002 to 2022), although 2022 has more outliers and a wider range.
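The comparison of medians, IQRs, and ranges read off the box plots can also be checked numerically. The following is a minimal sketch on made-up numbers: `both_demo` is a hypothetical stand-in for `both`, assuming only the `Year` and `pm2.5mean` columns.

```r
# Hypothetical stand-in for `both`: made-up daily PM 2.5 means for two years
both_demo <- data.frame(
  Year = rep(c("2002", "2022"), each = 5),
  pm2.5mean = c(10, 14, 18, 25, 40,   # 2002
                4, 6, 8, 10, 120)     # 2022, with one extreme day
)

# Median, IQR, minimum and maximum of the daily means within each year
stats_by_year <- do.call(
  rbind,
  lapply(
    split(both_demo$pm2.5mean, both_demo$Year),
    function(v) c(median = median(v), iqr = IQR(v), min = min(v), max = max(v))
  )
)

print(stats_by_year)
```

In this toy example the later year has a lower median and a narrower IQR but a much larger maximum, which is the same pattern described above for the real data.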

County:

Compared with 2002, the 2022 data seem to have lower interquartile ranges (IQRs) overall. However, the 2022 data span a wider overall range, with much higher maximum values (around 300) and lower minimum values (below 0, i.e., the implausible values). At the county level, the data suggest that daily PM 2.5 concentrations in California may have decreased over the last 20 years (from 2002 to 2022), although 2022 has more outliers and a wider range.
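A per-county change between the two years makes the county-level comparison concrete. The sketch below uses base R on invented data: `cnty_demo`, the county labels "A" and "B", and all values are hypothetical, and the grouping mirrors the Year-County averaging used for `cnty`.

```r
# Hypothetical county-level data: made-up daily means for two counties
cnty_demo <- data.frame(
  Year = rep(c("2002", "2022"), each = 4),
  County = rep(c("A", "A", "B", "B"), times = 2),
  pm2.5mean = c(20, 24, 10, 14,   # 2002
                9, 11, 8, 10)     # 2022
)

# Mean of the daily values within each County-Year group
# (rows are counties, columns are years)
cnty_avg <- with(cnty_demo, tapply(pm2.5mean, list(County, Year), mean))

# Change in the county average from 2002 to 2022 (negative = decrease)
change <- cnty_avg[, "2022"] - cnty_avg[, "2002"]

print(cnty_avg)
print(change)
```

A negative entry in `change` means the county average fell between the two years; applied to `both`, this would give one number per county instead of a visual comparison of many boxes.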

Site in Los Angeles:

Compared with 2002, the 2022 data have a narrower interquartile range (IQR) and a lower median. The 2002 data also span a wider overall range, with higher maximum values (around 80), while the 2022 data have slightly lower minimum values. For sites in Los Angeles, the data show that daily PM 2.5 concentrations have generally decreased over the last 20 years (from 2002 to 2022).